
Anyone who has tried to run a heavy script or train a model on an ordinary laptop knows the pain. The fan starts to whine, the screen freezes for a moment, and you hope the machine will not shut down entirely. For years, the machine on your desk dictated what you could and could not do. If you needed more power, you had to buy a more powerful computer.
Then came the era of the cloud, and it changed the way we work.
The name may sound soft and dreamy, but the concept makes a lot of sense. People often joke that the cloud is just someone else's computer, but that misses the real value. The electricity grid is a better comparison. You do not build a power station to switch on your lights. You plug in and pay only for what you use. Cloud services work the same way. Rather than buying your own servers and costly hardware, you pay a provider such as AWS, Azure, or Google Cloud for the computing power you need on a pay-as-you-go basis.
This shift changed the course of modern data science. Here is how it works and why it has become a necessity.
Put aside the technical terms, and cloud computing is simply the delivery of computing services over the internet. Instead of keeping all your data on your own personal computer, you rely on remote servers to handle the storage, processing, and networking of that data.
Cloud services usually fall into three simple categories:
Storage: Keeping files and data on remote servers, as Google Drive or Dropbox do.
Compute: Renting processing units to carry out calculations.
Networking: Moving data from point A to point B efficiently.
This arrangement lets individuals and companies use powerful computing resources without paying to own the physical hardware.
[Figure: a diagram of the different types of cloud computing]
Data science is about deriving insights from data. In the past, datasets were small, and processing them did not require special computers. Today we face huge volumes of information that grow by the second. Big Data is simply too big for personal machines to handle.
This is why the cloud has become a necessity for anyone working in data science.
The problem of scale
Consider a recommendation system such as the one Netflix uses. It must analyze billions of views and interactions. No personal computer can do that. It would crash while loading even a fraction of the dataset. The cloud solves this with a feature known as elasticity. If a project needs one hundred virtual servers for an hour, you can turn them on and then turn them off when the work is done. You pay only for that hour. Buying a hundred physical servers would be very expensive and would make no sense for short-lived work.
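To make the elasticity argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (the hourly rental rate and the purchase price per server) is a made-up assumption for illustration, not a real quote from any provider, and the function names are hypothetical.

```python
# All prices below are hypothetical placeholders, not real provider rates.
HOURLY_RATE_PER_SERVER = 0.50      # assumed rental price, dollars per server-hour
PURCHASE_PRICE_PER_SERVER = 5000   # assumed price to buy one physical server, dollars


def rental_cost(servers, hours, hourly_rate=HOURLY_RATE_PER_SERVER):
    """Cost of renting `servers` machines for `hours` hours each."""
    return servers * hours * hourly_rate


def purchase_cost(servers, unit_price=PURCHASE_PRICE_PER_SERVER):
    """Upfront cost of buying `servers` machines outright."""
    return servers * unit_price


def break_even_hours(hourly_rate=HOURLY_RATE_PER_SERVER,
                     unit_price=PURCHASE_PRICE_PER_SERVER):
    """Hours of use per server at which renting has cost as much as buying.

    Deliberately simplistic: ignores power, maintenance, and staff costs.
    """
    return unit_price / hourly_rate


if __name__ == "__main__":
    print("Rent 100 servers for 1 hour:", rental_cost(100, 1))
    print("Buy 100 servers:", purchase_cost(100))
    print("Break-even hours per server:", break_even_hours())
```

Under these assumed prices, a one-hour burst costs a tiny fraction of the purchase price, and buying only starts to pay off after thousands of hours of use per server. With real prices the exact figures would differ, but the shape of the comparison is the point.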
Access to high-end hardware, such as GPUs
Machine learning models are very expensive to compute. Ordinary CPUs are not designed for this kind of mathematical workload. That is where GPUs come in. High-end GPUs are quite costly, but the cloud lets anyone rent them for a fraction of the purchase price and train models that would otherwise need a supercomputer. By renting the hardware they need, students, researchers, and small teams can work at the same level as larger companies.
Better teamwork and fewer headaches
Every developer knows the frustration of hearing "It works on my machine." Cloud platforms help prevent this. They let teams share environments and tools so that everyone works on the same setup. Technologies such as Docker let data scientists create stable environments that behave consistently across the entire team. Rather than sending files back and forth, everyone works from a single source of truth stored in the cloud. This improves collaboration and reduces errors.
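As a small illustration of what "the same setup" means in practice, here is a minimal Python sketch that compares the package versions installed on one machine against a pinned list the team has agreed on. This is not Docker and does not replace it; it is only a lightweight check for the kind of drift that shared environments are meant to prevent. The function name find_mismatches and the example pins are hypothetical, while importlib.metadata is part of the Python standard library (3.8 and later).

```python
from __future__ import annotations

from importlib import metadata


def find_mismatches(pinned: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; a real team would read these from a shared file.
    team_pins = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for package, (expected, installed) in find_mismatches(team_pins).items():
        print(f"{package}: expected {expected}, found {installed}")
```

If the script prints nothing, the local machine matches the team's pins. Any line it does print points to a package that could explain why code behaves differently from one machine to the next.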
Cloud computing did more than cut costs. It changed the way people ask scientific questions. The question is no longer whether you own the hardware to run something. Now it is about what you want to explore and how quickly you can experiment. In data science, the cloud is not just a place to store things. It is what drives the whole field. Without it, the rapid progress of modern data-driven work would not have been possible.